719 research outputs found

    DSPatch: Dual Spatial Pattern Prefetcher

    Full text link
    High main memory latency continues to limit performance of modern high-performance out-of-order cores. While DRAM latency has remained nearly the same over many generations, DRAM bandwidth has grown significantly due to higher frequencies, newer architectures (DDR4, LPDDR4, GDDR5) and 3D-stacked memory packaging (HBM). Current state-of-the-art prefetchers do not do well in extracting higher performance when higher DRAM bandwidth is available. Prefetchers need the ability to dynamically adapt to available bandwidth, boosting prefetch count and prefetch coverage when headroom exists and throttling down to achieve high accuracy when the bandwidth utilization is close to peak. To this end, we present the Dual Spatial Pattern Prefetcher (DSPatch) that can be used as a standalone prefetcher or as a lightweight adjunct spatial prefetcher to the state-of-the-art delta-based Signature Pattern Prefetcher (SPP). DSPatch builds on a novel and intuitive use of modulated spatial bit-patterns. The key idea is to: (1) represent program accesses on a physical page as a bit-pattern anchored to the first "trigger" access, (2) learn two spatial access bit-patterns: one biased towards coverage and another biased towards accuracy, and (3) select one bit-pattern at run-time based on the DRAM bandwidth utilization to generate prefetches. Across a diverse set of workloads, using only 3.6KB of storage, DSPatch improves performance over an aggressive baseline with a PC-based stride prefetcher at the L1 cache and the SPP prefetcher at the L2 cache by 6% (9% in memory-intensive workloads and up to 26%). Moreover, the performance of DSPatch+SPP scales with increasing DRAM bandwidth, growing from 6% over SPP to 10% when DRAM bandwidth is doubled.Comment: This work is to appear in MICRO 201

    Exploring collaboration in challenging environments: from the car to the factory and beyond

    Get PDF
    We propose a daylong workshop at CSCW2012 on the topic collaboration in challenging and dicult environments, which are to our understanding all contexts, which go beyond traditional working/oce settings topic. Examples for these environments can be the automotive context or the context of a semiconductor factory, which show very specic contextual conditions and therefore oer special research challenges: How to address all passengers in the car, not only the driver? How to explore operator tasks in a cleanroom? How could the long-term (social) collaboration of robots and humans be investigated in privacy critical environments

    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Full text link
    Modern computing systems are embracing hybrid memory comprising of DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access latency much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact their reliability. We propose an architectural change, where each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly-referenced memory page in a program and placing it to a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid tiered memory improves both performance and reliability for both single-core and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium on Memory Managemen

    Mutation of Directed Graphs -- Corresponding Regular Expressions and Complexity of Their Generation

    Full text link
    Directed graphs (DG), interpreted as state transition diagrams, are traditionally used to represent finite-state automata (FSA). In the context of formal languages, both FSA and regular expressions (RE) are equivalent in that they accept and generate, respectively, type-3 (regular) languages. Based on our previous work, this paper analyzes effects of graph manipulations on corresponding RE. In this present, starting stage we assume that the DG under consideration contains no cycles. Graph manipulation is performed by deleting or inserting of nodes or arcs. Combined and/or multiple application of these basic operators enable a great variety of transformations of DG (and corresponding RE) that can be seen as mutants of the original DG (and corresponding RE). DG are popular for modeling complex systems; however they easily become intractable if the system under consideration is complex and/or large. In such situations, we propose to switch to corresponding RE in order to benefit from their compact format for modeling and algebraic operations for analysis. The results of the study are of great potential interest to mutation testing

    An Approximate Dynamic Programming Approach to Urban Freight Distribution with Batch Arrivals

    Get PDF
    We study an extension of the delivery dispatching problem (DDP) with time windows, applied on LTL orders arriving at an urban consolidation center. Order properties (e.g., destination, size, dispatch window) may be highly varying, and directly distributing an incoming order batch may yield high costs. Instead, the hub operator may wait to consolidate with future arrivals. A consolidation policy is required to decide which orders to ship and which orders to hold. We model the dispatching problem as a Markov decision problem. Dynamic Programming (DP) is applied to solve toy-sized instances to optimality. For larger instances, we propose an Approximate Dynamic Programming (ADP) approach. Through numerical experiments, we show that ADP closely approximates the optimal values for small instances, and outperforms two myopic benchmark policies for larger instances. We contribute to literature by (i) formulating a DDP with dispatch windows and (ii) proposing an approach to solve this DDP

    Improving Phase Change Memory Performance with Data Content Aware Access

    Full text link
    A prominent characteristic of write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten. We observe that overwriting unknown memory content can incur significantly higher latency and energy compared to overwriting known all-zeros or all-ones content. This is because all-zeros or all-ones content is overwritten by programming the PCM cells only in one direction, i.e., using either SET or RESET operations, not both. In this paper, we propose data content aware PCM writes (DATACON), a new mechanism that reduces the latency and energy of PCM writes by redirecting these requests to overwrite memory locations containing all-zeros or all-ones. DATACON operates in three steps. First, it estimates how much a PCM write access would benefit from overwriting known content (e.g., all-zeros, or all-ones) by comprehensively considering the number of set bits in the data to be written, and the energy-latency trade-offs for SET and RESET operations in PCM. Second, it translates the write address to a physical address within memory that contains the best type of content to overwrite, and records this translation in a table for future accesses. We exploit data access locality in workloads to minimize the address translation overhead. Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses. DATACON overwrites unknown content only when it is absolutely necessary to do so. We evaluate DATACON with workloads from state-of-the-art machine learning applications, SPEC CPU2017, and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly improves system performance and memory system energy consumption compared to the best of performance-oriented state-of-the-art techniques.Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium on Memory Management (ISMM

    Metamaterial Polarization Converter Analysis: Limits of Performance

    Full text link
    In this paper we analyze the theoretical limits of a metamaterial converter that allows for linear-to- elliptical polarization transformation with any desired ellipticity and ellipse orientation. We employ the transmission line approach providing a needed level of the design generalization. Our analysis reveals that the maximal conversion efficiency for transmission through a single metamaterial layer is 50%, while the realistic re ection configuration can give the conversion efficiency up to 90%. We show that a double layer transmission converter and a single layer with a ground plane can have 100% polarization conversion efficiency. We tested our conclusions numerically reaching the designated limits of efficiency using a simple metamaterial design. Our general analysis provides useful guidelines for the metamaterial polarization converter design for virtually any frequency range of the electromagnetic waves.Comment: 10 pages, 11 figures, 2 table
    corecore